Principal component analysis

We first use principal component analysis (PCA) to examine whether the features as a whole are correlated with the response variable, forest cover type. Outliers have been removed from the PCA plots so that extreme points do not skew them.

[Figures: four plots of chunk pca]
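As a rough illustration of the projection behind such plots, the following standard-library Python sketch standardizes a small feature table, finds the first principal component by power iteration, and projects each row onto it. The toy rows, the column meanings (elevation, slope, distance to water), and all function names are assumptions made for the example. They are not the data or code used to produce the figures.

```python
import math
import random

# Hypothetical toy rows: [elevation, slope, distance_to_water].
rows = [
    [2000, 10, 300],
    [2200, 12, 100],
    [2500, 15, 250],
    [2900, 20, 50],
    [3100, 22, 400],
    [3300, 25, 150],
]

def standardize(data):
    """Center each column and scale it to unit (population) variance."""
    n, d = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(d)]
    # "or 1.0" guards against division by zero for constant columns.
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in data) / n) or 1.0
           for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in data]

def covariance(z):
    """Covariance matrix of already-centered data."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def first_component(cov, iters=500, seed=0):
    """Power iteration: unit eigenvector and eigenvalue of the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue

z = standardize(rows)
cov = covariance(z)
pc1, eigenvalue = first_component(cov)

# Share of total variance captured by the first component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# Coordinates of each observation along the first component.
scores = [sum(a * b for a, b in zip(r, pc1)) for r in z]

print("PC1 loadings:", [round(x, 3) for x in pc1])
print("explained variance ratio:", round(explained, 3))
```

Plotting `scores` (and the scores on a second component) colored by cover type is one way to see whether the classes separate along the main directions of variation.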

Correlated features

To identify features that correlate with the forest cover type, we correlate each feature with each forest cover type.

[Figure: correlation heatmap (chunk cors)]

The heatmap reveals that elevation, along with certain wilderness areas and soil types, is correlated with particular forest cover types.
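One plausible way to build such a feature-by-class table is to turn each cover type into a 0/1 indicator and compute the Pearson correlation of every feature with every indicator. The sketch below does this in standard-library Python. The toy values, the feature name `wilderness_area_1`, and the integer class labels are hypothetical, and the original analysis may have computed its correlations differently.

```python
import math

# Hypothetical toy data: two features and a class label per observation.
features = {
    "elevation": [2000, 2200, 2500, 2900, 3100, 3300],
    "wilderness_area_1": [1, 1, 1, 0, 0, 0],  # binary indicator feature
}
cover_type = [3, 3, 2, 2, 1, 1]  # assumed integer-coded cover types

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Correlate each feature with a one-vs-rest indicator for each cover type.
correlations = {}
for name, values in features.items():
    for cls in sorted(set(cover_type)):
        indicator = [1 if c == cls else 0 for c in cover_type]
        correlations[(name, cls)] = pearson(values, indicator)

for (name, cls), r in correlations.items():
    print(f"{name:>18} vs cover type {cls}: {r:+.2f}")
```

Arranging `correlations` as a grid, with features as rows and cover types as columns, gives the kind of matrix a heatmap displays.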

Conditional inference trees

Finally, we illustrate how each candidate feature (elevation, wilderness area, and soil type) can help predict forest cover type, using conditional inference trees, a kind of decision tree.

[Figures: conditional inference trees for elevation, wilderness area, and soil type (chunks tree_elevation, tree_wilderness, tree_soil_type)]
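To make concrete how a single feature can drive a tree split, here is a minimal standard-library Python sketch of a one-level decision stump on elevation. Note the substitution: it picks the threshold by minimizing weighted Gini impurity (a CART-style criterion), whereas conditional inference trees select splits with statistical tests of independence. The data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: elevation and an integer-coded cover type.
elevation = [2000, 2200, 2500, 2900, 3100, 3300]
cover_type = [2, 2, 2, 1, 1, 1]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def majority(labels):
    """Most common label in the list."""
    return Counter(labels).most_common(1)[0][0]

def best_split(values, labels):
    """Return (threshold, weighted impurity, left prediction, right prediction)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score, majority(left), majority(right))
    return best

threshold, impurity, left_pred, right_pred = best_split(elevation, cover_type)
print(f"split at elevation <= {threshold}: "
      f"predict {left_pred} below, {right_pred} above (impurity {impurity:.3f})")
```

A full tree would apply the same search recursively within each branch. A conditional inference tree would additionally stop splitting once no feature shows a statistically significant association with the response.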